Universal Access To All Knowledge


Poster: Conan the Librarian Date: Jun 17, 2004 12:37am
Forum: web Subject: Domain Hijacking and Robots.txt

I have encountered a problem: the content of an abandoned domain, which has been available here for years, is now retroactively blocked by a redirect to a robots.txt file. The file was put up by a spammer who buys abandoned domains in order to send popup ads to people who arrive via outdated links.

This squatter has no rights to the former content and no right to block it. I see this as a growing problem for the integrity of the archive if a solution is not found. Hopefully the now-unavailable content is only blocked from access and not permanently deleted, so that this problem can be fixed. I would send more specifics by e-mail, if needed.


Poster: ylbissop Date: Sep 2, 2004 7:15pm
Forum: web Subject: Re: Domain Hijacking and Robots.txt

Just an idea: would it be possible to index a whois notation with each crawl and, when an exclusion is found, only block pages back to the original date of registration?

So if x.com is owned by bob,
then owned by carl (with a robots.txt exclusion),
then owned by rob (who asks to be reindexed),
then owned by megacorp (with a robots.txt exclusion),
then owned by carl (who asks to be reindexed):

bob's and rob's pages will always be available, and carl's second set of pages will be available too, while megacorp's pages and carl's original pages would not.

This post was modified by ylbissop on 2004-08-27 21:56:38
-ylbissop http://www.ylbissop.com

This post was modified by ylbissop on 2004-09-03 02:15:02
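
The ownership-based rule in the x.com example above could be sketched roughly as follows. This is only an illustration, not how the Archive works: the `OwnershipPeriod` record, the `is_viewable` helper, and every date in `HISTORY` are hypothetical, and it assumes a registration date could be recovered from whois for each change of owner.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OwnershipPeriod:
    owner: str
    registered: date   # start of this registration (assumed to come from whois)
    excluded: bool     # True if this owner put up a robots.txt exclusion


def period_for(periods: List[OwnershipPeriod], captured: date) -> Optional[OwnershipPeriod]:
    """Return the ownership period in effect on the capture date, if any."""
    current = None
    for period in sorted(periods, key=lambda p: p.registered):
        if period.registered <= captured:
            current = period
        else:
            break
    return current


def is_viewable(periods: List[OwnershipPeriod], captured: date) -> bool:
    """An exclusion only hides captures made while the excluding owner held the domain."""
    period = period_for(periods, captured)
    # Assumption: captures older than any known registration stay viewable.
    return period is None or not period.excluded


# Hypothetical timeline for x.com, following the example above (dates invented).
HISTORY = [
    OwnershipPeriod("bob", date(1998, 1, 1), excluded=False),
    OwnershipPeriod("carl", date(2000, 1, 1), excluded=True),
    OwnershipPeriod("rob", date(2001, 1, 1), excluded=False),
    OwnershipPeriod("megacorp", date(2002, 1, 1), excluded=True),
    OwnershipPeriod("carl", date(2003, 1, 1), excluded=False),
]

for year in range(1998, 2004):
    print(year, is_viewable(HISTORY, date(year, 6, 1)))
```

With this timeline, captures from bob's, rob's, and carl's second period come back viewable, while captures from carl's first period and megacorp's period do not.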


Poster: molly (Administrator, Curator, or Staff) Date: Jun 20, 2004 2:37am
Forum: web Subject: Re: Domain Hijacking and Robots.txt

We are aware of this problem and are working on a way to allow time-based exclusions. We realize that this problem will continue to grow as the Internet (and our archive) gets older. However, we are a very small non-profit with lots on our plate; please be patient!


Poster: Bill Harris Date: Nov 4, 2004 2:46am
Forum: web Subject: Re: Domain Hijacking and Robots.txt

Thanks for keeping this as a priority item. My Web site pointed to articles I had written for an online publication; when they folded, I changed the links to point to the Wayback Machine. I just discovered the problem last week. While I have the former site owner's permission to display that material, I'm not sure if I can recreate it all.

Just curious: if you do create time-based permissions, is material currently blocked still available on your servers, or did you delete it when the robots.txt file hit? That may help me decide how much effort to put into my restoration work.


Poster: ssybesma Date: Dec 31, 2006 8:47pm
Forum: web Subject: Re: Domain Hijacking and Robots.txt

Yep, thanks for keeping this as a priority also. I think it should be very close to the front burner. This problem keeps getting worse as time goes on, and it will affect the usefulness of archive.org, one of my absolute _FAVORITE_ sites on the internet, if not my #1 favorite.

I think the argument is summed up like this: You may own the domain name and what's on it _now_, but you didn't always own it and you certainly don't own what used to be on it. You can't have the right to erase history. No one has that right. You can't erase information about previous owners of a house when you buy the property and you don't have a right to the pictures the previous owner took when he owned the house, nor do you own or have any rights to what used to be in his house when he owned it.


Poster: waffffffle Date: Jul 6, 2004 4:25pm
Forum: web Subject: Re: Domain Hijacking and Robots.txt

Would this be the reason why I cannot see archive pages for georgia-avenue.com?


Poster: simon c (Administrator, Curator, or Staff) Date: Jul 7, 2004 1:03am
Forum: web Subject: Re: Domain Hijacking and Robots.txt

This appears to be the case, yes:

http://georgia-avenue.com/robots.txt

[User-agent: ia_archiver is the Archive's crawler]

Regards,
s!
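
For anyone wanting to check whether a robots.txt file shuts out ia_archiver, here is a rough sketch using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT` are a made-up example, not the actual contents of the file linked above, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (NOT the real georgia-avenue.com file):
# it disallows everything for the Archive's crawler only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def blocks_agent(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(blocks_agent(SAMPLE_ROBOTS_TXT, "ia_archiver", "http://example.com/page.html"))
print(blocks_agent(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "http://example.com/page.html"))
```

The first call reports the page as blocked for ia_archiver; the second shows that a crawler not named in the file is unaffected.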